Abstract

It is a classical result of Stein and Waterman that the asymptotic number of RNA
secondary structures is $1.104366 \cdot n^{-3/2} \cdot 2.618034^n$. In this paper, we study
combinatorial asymptotics for two special subclasses of RNA secondary structures: canonical
and saturated structures. Canonical secondary structures are defined to have no lonely
(isolated) base pairs. This class of secondary structures was introduced by Bompfünewerer et
al., who noted that the run time of the Vienna RNA Package is substantially reduced when
restricting computations to canonical structures. Here we provide an explanation for the
speed-up, by proving that the asymptotic number of canonical RNA secondary structures
is $2.1614 \cdot n^{-3/2} \cdot 1.96798^n$ and that the expected number of base pairs in a canonical
secondary structure is $0.31724 \cdot n$. The asymptotic number of canonical secondary structures
was obtained much earlier by Hofacker, Schuster and Stadler using a different method.

Saturated secondary structures have the property that no base pairs can be added
without violating the definition of secondary structure (i.e. introducing a pseudoknot
or base triple). Here we show that the asymptotic number of saturated structures is
$1.07427 \cdot n^{-3/2} \cdot 2.35467^n$, the asymptotic expected number of base pairs is $0.337361 \cdot n$, and
the asymptotic number of saturated stem-loop structures is $0.323954 \cdot 1.69562^n$, in contrast
to the number $2^{n-2}$ of (arbitrary) stem-loop structures as classically computed by Stein
and Waterman. Finally, we apply work of Drmota [5, 6] to show that the density of states
for [all resp. canonical resp. saturated] secondary structures is asymptotically Gaussian.
We introduce a stochastic greedy method to sample random saturated structures, called
quasi-random saturated structures, and show that their expected number of base pairs is
$0.340633 \cdot n$.
1 Introduction



Imagine an undirected* graph, described by placing graph vertices $1, \ldots, n$ along the periphery
of a circle in a counter-clockwise manner, and placing graph edges as chords within the circle.
An outerplanar graph is a graph whose circular representation is planar; i.e. there are no
crossings. An RNA secondary structure, formally defined in Section 2, is an outerplanar
graph (no pseudoknots) with the property that no vertex is incident to more than one edge
(no base triples) and that for every chord between vertices $i, j$, there exist at least $\theta = 1$
vertices between $i$ and $j$ that are not incident to any edge (hairpin requirement). RNA secondary
structure is equivalently defined to be a well-balanced parenthesis expression $s_1, \ldots, s_n$ with
dots, where if nucleotide $i$ is unpaired then $s_i =$ •, while if there is a base pair between
nucleotides $i < j$ then $s_i =$ ( and $s_j =$ ). This latter representation is known as the Vienna
representation or dot bracket notation (dbn).

Formally, a well-balanced parenthesis expression $w_1 \cdots w_n$ can be defined as follows. If
$\Sigma$ denotes a finite alphabet, $a \in \Sigma$, and $w = w_1 \cdots w_n \in \Sigma^*$ is an arbitrary word, or
sequence of characters drawn from $\Sigma$, then $|w|_a$ designates the number of occurrences of $a$
in $w$. Letting $\Sigma = \{\,(\,,\,)\,\}$, a word $w = w_1 \cdots w_n \in \Sigma^*$ is well-balanced if for all $1 \le i < n$,
$|w_1 \cdots w_i|_{(} \ge |w_1 \cdots w_i|_{)}$ and $|w_1 \cdots w_n|_{(} = |w_1 \cdots w_n|_{)}$. Finally, when considering
RNA secondary structures, we consider instead the alphabet $\Sigma = \{\,(\,,\,)\,,\,$•$\,\}$, but otherwise
the definition of well-balanced expression remains unchanged. The number of well-balanced
parenthesis expressions of length $n$ over the alphabet $\Sigma = \{\,(\,,\,)\,\}$ is known as the Catalan
number $C_n$, while that over the alphabet $\Sigma = \{\,(\,,\,)\,,\,$•$\,\}$ is known as the Motzkin number
$M_n$ [4]. Stein and Waterman [19] computed the number $S_n$ of well-balanced parenthesis
expressions in the alphabet $\Sigma = \{\,(\,,\,)\,,\,$•$\,\}$, where there exist at least $\theta = 1$ occurrences of
• between corresponding left and right parentheses ( respectively ). It follows that $S_n$ is
exactly the number of RNA secondary structures on $[1, n]$, where there exist at least $\theta = 1$
unpaired bases in every hairpin loop.
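The definition above can be checked mechanically. The following is a minimal sketch, not taken from the paper's supplementary code: the function names are hypothetical, the dot • is written as an ASCII `.`, and the hairpin requirement is tested in the form $j - i > \theta$ used later in the text. The brute-force counter enumerates all $3^n$ words and is therefore only usable for small $n$.

```python
from itertools import product


def is_secondary_structure(s, theta=1):
    """Return True if the dot-bracket string s is well-balanced and every
    base pair (i, j) satisfies j - i > theta (hairpin requirement)."""
    stack = []
    for j, ch in enumerate(s):
        if ch == "(":
            stack.append(j)
        elif ch == ")":
            if not stack:
                return False      # closing parenthesis without a partner
            i = stack.pop()
            if j - i <= theta:
                return False      # fewer than theta positions inside the pair
        elif ch != ".":
            return False          # character outside the alphabet
    return not stack              # every "(" must have been closed


def count_structures(n, theta=1):
    """Brute-force count of secondary structures of length n (small n only)."""
    return sum(
        is_secondary_structure("".join(word), theta)
        for word in product("().", repeat=n)
    )
```

For $\theta = 1$ and $n = 1, \ldots, 6$ this enumeration gives 1, 1, 2, 4, 8, 17, the first values of the Stein-Waterman numbers $S_n$.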

In this paper, we are interested in specific classes of secondary structure: canonical and
saturated structures. A secondary structure is canonical [1] if it has no lonely (isolated)
base pairs. A secondary structure is saturated [22] if no base pairs can be added without
violating the notion of secondary structure, formally defined in Section 2. In order to compute
parameters like the asymptotic value for the number of structures, the expected number of base
pairs, etc., throughout this paper we adopt the model of Stein and Waterman [19]. In this model,
any position (nucleotide, also known as base) can pair with any other position, and every
hairpin loop must contain at least $\theta = 1$ unpaired bases; i.e. if $i, j$ are paired, then $j - i > \theta$.
This latter condition is due to steric constraints for RNA. At the cost of additional effort,
the combinatorial methods of this paper could be applied to handle the situation of most
secondary structure software, which sets $\theta = 3$.

1.1 Examples of secondary structure representations 

Figure [1] gives equivalent views of the secondary structure of 5S ribosomal RNA with GenBank 
accession number NC_000909 of the methane-generating archaebacterium Methanocaldococcus 
jannaschii, as determined by comparative sequence analysis and taken from the 5S Ribosomal 
RNA Database [20] located at http://rose.man.poznan.pl/5SData/. The sequence and its
secondary structure in (Vienna) dot bracket notation are as follows: 

*We often describe the graph edges of an undirected graph as (i, j), where i < j, rather than {i, j}. 
UGGUACGGCGGUCAUAGCGGGGGGGCCACACCCGAACCCAUCCCGAACUCGGAAGUUAAGCCCCCCAGCGAUGCCCCGAGUACUGCCAUCUGGCGGGAAAGGGGCGACGCCGCCGGCCAC

((((• CCCCCCC CCCCCCC (((((( ))))..)) ))))).))...((((( (((((....»»)....)»))...))))))))))). 



Equivalent representations for the same secondary structure may be produced by the software
jViz [21], as depicted in Figure 1. The left panel of this figure depicts the circular Feynman
diagram (i.e. outerplanar graph representation), the middle panel depicts the linear Feynman
diagram, and the right panel depicts the classical representation. This latter representation,
most familiar to biologists, may also be obtained by RNAplot from the Vienna RNA Package
[8].
Figure 1: Depiction of 5S ribosomal RNA from M. jannaschii with GenBank accession number
NC_000909. Equivalent representations as (Left) outerplanar graph (also called Feynman
circular diagram), (Middle) Feynman linear diagram, (Right) classical diagram (most familiar
to biologists). The sequence and secondary structure were taken from the 5S Ribosomal RNA
Database [20], and the graph was created using jViz [21].

1.2 Outline and results of the paper 

In Section 2, we review a combinatorial method, known as the DSV methodology, and the
important Flajolet-Odlyzko Theorem, which allows one to obtain asymptotic values of Taylor
coefficients of analytic generating functions $f(z) = \sum_{i \ge 0} a_i z^i$ by determining the dominant
singularity of $f$. The description of the DSV methodology and Flajolet-Odlyzko theorem is
not meant to be self-contained, although we very briefly describe the broad outline. For a
very clear review of this method, with a number of example applications, please see [12] or
the recent monograph of Flajolet and Sedgewick [18].

In Section 2.1, we compute the asymptotic number $2.1614 \cdot n^{-3/2} \cdot 1.96798^n$ of canonical
secondary structures, obtaining the same value obtained by Hofacker, Schuster and Stadler [9]
by a different method, known as the Bender-Meir-Moon method. In Section 2.2, we compute
the expected number $0.31724 \cdot n$ of base pairs in canonical secondary structures. In Section 2.3,
we apply the DSV methodology to compute the asymptotic number $1.07427 \cdot n^{-3/2} \cdot 2.35467^n$
of saturated structures, while in Section 2.4, we compute the expected number $0.337361 \cdot n$
of base pairs of saturated structures. In Section 2.5, we compute the asymptotic number
$0.323954 \cdot 1.69562^n$ of saturated stem-loop structures, which is substantially smaller than the
number $2^{n-2} - 1$ of (all) stem-loop structures, as computed by Stein and Waterman [19].

In Section 3, we consider a natural stochastic process to generate random saturated
structures, called in the sequel quasi-random saturated structures. The stochastic process adds
base pairs, one at a time, according to the uniform distribution, without violating any of the
constraints of a structure. The main result of this section is that asymptotically, the expected
number of base pairs in quasi-random saturated structures is $0.340633 \cdot n$, rather close to the
expected number $0.337361 \cdot n$ of base pairs of saturated structures. The numerical proximity
of these two values suggests that stochastic greedy methods might find application in other
areas of random graph theory. In Section 4, we provide some concluding remarks.
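The stochastic process just described can be sketched in a few lines. This is only an illustration under stated assumptions: "uniform" is read here as uniform over all base pairs that can currently be added, positions are 0-indexed, $\theta = 1$ by default, and the function name is hypothetical. The precise process analyzed in Section 3 may differ in detail.

```python
import random


def quasi_random_saturated(n, theta=1, rng=random):
    """Greedy sampler: repeatedly add one admissible base pair, chosen
    uniformly at random, until no further pair can be added."""
    paired = [False] * n
    pairs = []
    while True:
        # A pair (i, j) is admissible if both ends are unpaired, the
        # threshold j - i > theta holds, and it crosses no existing pair.
        candidates = [
            (i, j)
            for i in range(n) if not paired[i]
            for j in range(i + theta + 1, n) if not paired[j]
            and not any((k < i < l) != (k < j < l) for k, l in pairs)
        ]
        if not candidates:
            break                      # the structure is now saturated
        i, j = rng.choice(candidates)
        paired[i] = paired[j] = True
        pairs.append((i, j))
    return sorted(pairs)
```

Averaging `len(quasi_random_saturated(n)) / n` over many runs for moderately large `n` gives an empirical estimate that can be compared with the limiting value quoted above; the naive candidate search makes each run cubic or worse in `n`, so this sketch is meant for small experiments only.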

At the web site http://bioinformatics.bc.edu/clotelab/SUPPLEMENTS/JBCBasymptotics/
we have placed Python programs and Mathematica code used in computing and checking the 
asymptotic number of canonical and saturated secondary structures, as well as the Maple 
code for checking Drmota's [6] conditions to deduce the asymptotic normality of the density 
of states of RNA structures. 

2 DSV methodology 

In this section, we describe a combinatorial method sometimes called DSV methodology, after 
Delest, Schützenberger and Viennot, which is a special case of what is called the symbolic
method in combinatorics, described at length in [18]. See also the Appendix of [12] for a 
detailed presentation of this method. This method enables one to obtain information on the 
number of combinatorial configurations defined by finite rules, for any size. This is done 
by translating those rules into equations satisfied by various generating functions. A second 
step is to extract asymptotic expansions from these equations. This is done by studying the 
singularities of these generating functions viewed as analytic functions. 

Since our goal is to derive asymptotic numbers of structures, following standard convention
we define an RNA secondary structure on a length $n$ sequence to be a set $S$ of ordered pairs
$(i, j)$ such that $1 \le i < j \le n$ and the following are satisfied.

1. Nonexistence of pseudoknots: If $(i, j)$ and $(k, \ell)$ belong to $S$, then it is not the case that
$i < k < j < \ell$.

2. No base triples: If $(i, j)$ and $(i, k)$ belong to $S$, then $j = k$; if $(i, j)$ and $(k, j)$ belong to
$S$, then $i = k$.

3. Threshold requirement: If $(i, j)$ belongs to $S$, then $j - i > \theta$, where $\theta$, generally taken
to be equal to 3, is the minimum number of unpaired bases in a hairpin loop; i.e. there
must be at least $\theta$ unpaired bases in a hairpin loop.

Note that the definition of secondary structure does not mention nucleotide identity - i.e. we 
do not require base-paired positions to be occupied by Watson-Crick or wobble pairs. 
For this reason, at times we may say that S is a secondary structure on [1, n], rather than
saying that S is a structure for RNA sequence of length n. In particular, an expression 
such as "the asymptotic number of structures is f(n)" means that the asymptotic number of 
structures on [1, n] is f(n). 
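The three conditions of this definition translate directly into a check on a set of pairs. The sketch below is illustrative only; the function name is hypothetical, positions are 1-indexed as in the text, and $\theta = 1$ is used as the default although the definition mentions that 3 is the usual value in practice.

```python
def is_valid_pair_set(pairs, n, theta=1):
    """Check the three conditions defining a secondary structure on [1, n]."""
    pairs = sorted(set(pairs))
    # Every pair must satisfy 1 <= i < j <= n and the threshold requirement.
    for i, j in pairs:
        if not (1 <= i < j <= n) or j - i <= theta:
            return False
    # No base triples: each position occurs in at most one pair.
    positions = [p for pair in pairs for p in pair]
    if len(positions) != len(set(positions)):
        return False
    # No pseudoknots: since the pairs are sorted by first coordinate and no
    # position is shared, two pairs cross exactly when i < k < j < l.
    for index, (i, j) in enumerate(pairs):
        for k, l in pairs[index + 1:]:
            if i < k < j < l:
                return False
    return True
```

For instance, `{(1, 5), (2, 4)}` is accepted on `[1, 5]`, while `{(1, 3), (2, 4)}` is rejected as a pseudoknot and `{(1, 3), (3, 5)}` as a base triple.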

Grammars 

We now proceed with basic definitions related to context-free grammars. If A is a finite 
alphabet, then A* denotes the set of all finite sequences (called words) of characters drawn 
from A. Let $\Sigma$ be the set consisting of the symbols for left parenthesis ( , right parenthesis
) , and dot •, used to represent a secondary structure in Vienna notation. A context-free
grammar (see, e.g., [11]) for RNA secondary structures is given by $G = (V, \Sigma, \mathcal{R}, S_0)$, where
$V$ is a finite set of nonterminal symbols (also called variables), $\Sigma = \{$•, ( , )$\}$, $S_0 \in V$ is the
start nonterminal, and

$$\mathcal{R} \subseteq V \times (V \cup \Sigma)^*$$

is a finite set of production rules. Elements of $\mathcal{R}$ are usually denoted by $A \to w$, rather than
$(A, w)$. If rules $A \to \alpha_1, \ldots, A \to \alpha_m$ all have the same left-hand side, then this is usually
abbreviated by $A \to \alpha_1 | \cdots | \alpha_m$.

If $x, y \in (V \cup \Sigma)^*$ and $A \to w$ is a rule, then by replacing the occurrence of $A$ in $xAy$ we
obtain $xwy$. Such a derivation in one step is denoted by $xAy \Rightarrow_G xwy$, while the reflexive,
transitive closure of $\Rightarrow_G$ is denoted $\Rightarrow_G^*$. The language generated by context-free grammar $G$
is denoted by $L(G)$, and defined by

$$L(G) = \{w \in \Sigma^* : S_0 \Rightarrow_G^* w\}.$$

For any nonterminal $S \in V$, we also write $L(S)$ to denote the language generated by rules
from $G$ when using start symbol $S$. A derivation of word $w$ from start symbol $S_0$ using
grammar $G$ is a leftmost derivation, if each successive rule application is applied to replace
the leftmost nonterminal occurring in the intermediate expression. A context-free grammar $G$
is non-ambiguous, if there is no word $w \in L(G)$ which admits two distinct leftmost derivations.
This notion is important since it is only when applied to non-ambiguous grammars that the 
DSV methodology leads to exact counts. 

For the sake of readers unfamiliar with context-free grammars, we present some examples
to illustrate the previous concepts. Consider the following grammar $G$, which generates
the collection of well-balanced parenthesis strings,* including the empty string. Define $G =
(V, \Sigma, \mathcal{R}, S)$, where the set $V$ of variables (also known as nonterminals) is $\{S\}$, the set $\Sigma$ of
terminals is $\{$ ( , ) $\}$, where $S$ is the start symbol, and where the set $\mathcal{R}$ of rules is given by

S → ε | (S) | SS

Here ε denotes the empty string. We claim that $G$ is an ambiguous grammar. Indeed,
consider the following two leftmost derivations, where we denote the order of rule applications
r1 := S → ε, r2 := S → SS, r3 := S → (S), by placing the rule designator under the
arrow. Clearly the leftmost derivation

$$S \underset{r2}{\Rightarrow} SS \underset{r2}{\Rightarrow} SSS \underset{r3,r1}{\Rightarrow} (\,)SS \underset{r3,r1}{\Rightarrow} (\,)(\,)S \underset{r3,r1}{\Rightarrow} (\,)(\,)(\,)$$

is distinct from the leftmost derivation

$$S \underset{r2}{\Rightarrow} SS \underset{r3,r1}{\Rightarrow} (\,)S \underset{r2,r3}{\Rightarrow} (\,)(S)S \underset{r1}{\Rightarrow} (\,)(\,)S \underset{r3}{\Rightarrow} (\,)(\,)(S) \underset{r1}{\Rightarrow} (\,)(\,)(\,)$$

yet both generate the same well-balanced parenthesis string. For the same reason, the grammar
with rules

S → • | •S | (S) | SS

generates precisely the collection of non-empty RNA secondary structures, yet this grammar
is ambiguous, and we would obtain an overcount by applying the DSV methodology. In
contrast, the grammar whose rules are

S → • | •S | (S) | (S)S

is easily seen to be non-ambiguous and to generate all non-empty RNA secondary structures.

*A well-balanced parenthesis string is a word over $\Sigma = \{$ ( , ) $\}$ with as many closing parentheses as opening
ones and such that when reading the word from left to right, the number of opening parentheses read is always
at least as large as the number of closing parentheses. RNA secondary structures can be considered to be
well-balanced parenthesis strings that also contain possible occurrences of • , and for which there exist at least
$\theta$ occurrences of • between corresponding left and right parentheses ( respectively ) .

Type of nonterminal        Equation for the g.f.
S → T | U                  S(z) = T(z) + U(z)
S → T U                    S(z) = T(z)U(z)
S → t                      S(z) = z
S → ε                      S(z) = 1

Table 1: Translation between context-free grammars and generating functions. Here, $G =
(V, \Sigma, \mathcal{R}, S_0)$ is a given context-free grammar, $S$, $T$ and $U$ are any nonterminal symbols in
$V$, and $t$ is a terminal symbol in $\Sigma$. The generating functions for the languages $L(S)$, $L(T)$,
$L(U)$ are respectively denoted by $S(z)$, $T(z)$, $U(z)$.
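The overcount caused by ambiguity can be made concrete by counting leftmost derivations (equivalently, derivation trees) of words of each length for the two grammars for non-empty secondary structures. The sketch below is an illustration with a hypothetical function name; each rule contributes one term to a recurrence on the word length.

```python
def count_derivations(n_max, ambiguous):
    """c[n] = number of leftmost derivations of words of length n from S,
    for the ambiguous grammar   S -> . | .S | (S) | SS
    or the non-ambiguous one    S -> . | .S | (S) | (S)S."""
    c = [0] * (n_max + 1)
    for n in range(1, n_max + 1):
        total = 1 if n == 1 else 0           # rule S -> .
        total += c[n - 1]                     # rule S -> .S
        if n >= 2:
            total += c[n - 2]                 # rule S -> (S)
        if ambiguous:
            # rule S -> SS: split into two non-empty words
            total += sum(c[i] * c[n - i] for i in range(1, n))
        else:
            # rule S -> (S)S: the parentheses occupy two positions
            total += sum(c[i] * c[n - 2 - i] for i in range(1, n - 2))
        c[n] = total
    return c
```

With the non-ambiguous rules the values for $n = 1, \ldots, 6$ are 1, 1, 2, 4, 8, 17, which agree with the number of secondary structures, whereas the ambiguous rules already give 2 derivations at $n = 2$ for the single word ••.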



Generating Functions 

Suppose that $G = (V, \Sigma, \mathcal{R}, S)$ is a non-ambiguous context-free grammar which generates a
collection $L(S)$ of objects (e.g. canonical secondary structures). To this grammar is associated
a generating function $S(z) = \sum_{n=0}^{\infty} s_n z^n$, such that the $n$th Taylor coefficient $[z^n]S(z) = s_n$
represents the number of objects we wish to count. In the sequel, $s_n$ will represent the
number of canonical secondary structures for RNA sequences of length $n$. The DSV method
uses Table 1 in order to translate the grammar rules of $\mathcal{R}$ into a system of equations for the
generating functions.



Asymptotics 

In the sequel, we often compute the asymptotic value of the Taylor coefficients of generating 
functions by first applying the DSV methodology, then using a simple corollary of a result of 
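Flajolet and Odlyzko [7]. That corollary is restated here as the following theorem.

Theorem 1 (Flajolet and Odlyzko) Assume that $S(z)$ has a singularity at $z = \rho > 0$, is
analytic in the rest of the region $\Delta \setminus \{\rho\}$, depicted in Figure 2, and that as $z \to \rho$ in $\Delta$,

$$S(z) \sim K (1 - z/\rho)^{\alpha}. \tag{1}$$

Then, as $n \to \infty$, if $\alpha \notin \{0, 1, 2, \ldots\}$,

$$s_n = [z^n]S(z) \sim \frac{K}{\Gamma(-\alpha)} \cdot n^{-\alpha-1} \cdot (1/\rho)^n.$$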

It is a consequence of Table 1 that the generating series of context-free grammars are algebraic
(this is the celebrated theorem of Chomsky and Schützenberger [2]). In particular this implies
that they have positive radius of convergence, a finite number of singularities, and their
behaviour in the neighborhood of their singularities is of the type (1). (See [18, §VII.6-9] for
an extensive treatment.)
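As a numerical illustration of the shape $K \cdot n^{-3/2} \cdot \rho^{-n}$ predicted by Theorem 1 when $\alpha = 1/2$, one can compare exact Stein-Waterman numbers with the classical estimate $1.104366 \cdot n^{-3/2} \cdot 2.618034^n$ quoted in the abstract. The sketch below is not from the paper: the exact numbers are generated from the non-ambiguous grammar S → • | •S | (S) | (S)S, the function names are hypothetical, and logarithms are used to avoid overflow.

```python
import math


def stein_waterman_counts(n_max):
    """Exact number of non-empty secondary structures (theta = 1) of each
    length, from the grammar S -> . | .S | (S) | (S)S."""
    s = [0] * (n_max + 1)
    for n in range(1, n_max + 1):
        total = (1 if n == 1 else 0) + s[n - 1]
        if n >= 2:
            total += s[n - 2]
        total += sum(s[i] * s[n - 2 - i] for i in range(1, n - 2))
        s[n] = total
    return s


def ratio_to_asymptotic(n, constant=1.104366, growth=2.618034):
    """Exact count divided by constant * n**(-3/2) * growth**n."""
    exact = stein_waterman_counts(n)[n]
    log_estimate = math.log(constant) - 1.5 * math.log(n) + n * math.log(growth)
    return math.exp(math.log(exact) - log_estimate)
```

The ratio approaches 1 as `n` grows, with a relative error that decays roughly like a constant divided by `n`, as expected from the next term of the singular expansion.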

A singularity of minimal modulus as in Theorem 1 is called a dominant singularity. The
location of the dominant singularity may be a source of difficulty. The simple case is when
an explicit expression is obtained for the generating functions; this happens for canonical 
secondary structures. The situation when only the system of polynomial equations is available 
is more involved; we show how to deal with it in the case of saturated structures. 



Figure 2: The shaded region $\Delta$ where, except at $z = \rho$, the generating function $S(z)$ must be
analytic.

2.1 Asymptotic number of canonical secondary structures 

In Bompfünewerer et al. [1], the notion of canonical secondary structure $S$ is defined as a
secondary structure having no lonely (isolated) base pairs; i.e. formally, there are no base
pairs $(i, j) \in S$ for which both $(i-1, j+1) \notin S$ and $(i+1, j-1) \notin S$. In this section, we
compute the asymptotic number of canonical secondary structures. Throughout this section,
secondary structure is interpreted to mean a secondary structure on an RNA sequence of
length $n$, for which each base can pair with any other base (not simply Watson-Crick and
wobble pairs), and with minimum number $\theta$ of unpaired bases in every hairpin loop set to
be 1. At the cost of working with more complex expressions, by the same method, one could
analyze the case when $\theta = 3$, which is assumed for the software mfold [23] and RNAfold [8].

Grammar 

Consider the context-free grammar $G = (V, \Sigma, \mathcal{R}, S)$, where $V$ consists of nonterminals $S$, $R$,
$\Sigma$ consists of the terminals • , ( , ) , $S$ is the start symbol and $\mathcal{R}$ consists of the following
rules:

S → • | S• | (R) | S(R)          (2)
R → (•) | (R) | (S(R)) | (S•)

The nonterminal $S$ is intended to generate all nonempty canonical secondary structures. In
contrast, the nonterminal $R$ is intended to generate all secondary structures which become
canonical when surrounded by a closing set of parentheses. We prove by induction on
expression length that the grammar $G$ is non-ambiguous and generates all nonempty canonical
secondary structures.

Define the context-free grammar $G_R$ to consist of the same collection $\mathcal{R}$ of rules as $G$,
defined above, but with starting nonterminal $R$ instead of $S$. Formally,

$$G_R = (V, \Sigma, \mathcal{R}, R).$$

Let $L(G)$, $L(G_R)$ denote the languages generated respectively by grammars $G$, $G_R$. Now
define languages $L_1$, $L_2$ of nonempty secondary structures with $\theta = 1$ by

$$L_1 = \{S : S \text{ is canonical}\}, \qquad L_2 = \{S : (S) \text{ is canonical}\}.$$

Note that structures like • • ( ( • ) ) and ( ( • ) ) ( ( • ) ) belong to $L_1$, but not to $L_2$, while
structures like ( ( • ) ) belong to both $L_1$, $L_2$. Note that any structure $S$ belonging to
$L_2$ must be of the form ( $S_0$ ); indeed, if $S$ were not of this form, but rather of the form
either • $S_0$ or ( $S_0$ ) $S_1$, then ( $S$ ) would have an outermost lonely pair of parentheses.

Claim. $L_1 = L(G)$, $L_2 = L(G_R)$.

Proof of Claim. Clearly $L_1 \supseteq L(G)$, $L_2 \supseteq L(G_R)$, so we show the reverse inclusions by
induction; i.e. by induction on $n$, we prove that $L_1 \cap \Sigma^n \subseteq L(G) \cap \Sigma^n$, $L_2 \cap \Sigma^n \subseteq L(G_R) \cap \Sigma^n$.

Base case: $n = 1$. Clearly $L(G) \cap \Sigma = \{$•$\} = L_1 \cap \Sigma$, $L(G_R) \cap \Sigma = \emptyset = L_2 \cap \Sigma$.
Induction case: Assume that the claim holds for all $n < k$.

Subcase 1. Let $S$ be a canonical secondary structure with length $|S| = k > 1$. Then either
(1) $S =$ • $S_0$, where $S_0 \in L_1$, or (2) $S =$ ( $S_0$ ), where $S_0 \in L_2$, or (3) $S =$ ( $S_0$ ) $S_1$, where
$S_0 \in L_2$ and $S_1 \in L_1$. Each of these cases corresponds to a different rule having left side $S$,
hence by the induction hypothesis, it follows that $S \in L(G)$.

Subcase 2. Let $S \in L_2$ be a secondary structure with length $|S| = k > 1$, for which ( $S$ ) is
canonical. If $S$ were of the form • $S_0$ or ( $S_0$ ) $S_1$, then ( $S$ ) would not be canonical, since
its outermost parenthesis pair would be a lonely pair. Thus $S$ is of the form ( $S_0$ ), where
either (1) $S_0$ begins with • , or (2) $S_0$ is of the form ( $S_1$ ), where $S_1$ is not canonical, but
( $S_1$ ) becomes canonical, or (3) $S_0$ is of the form ( $S_1$ ), where $S_1$ is canonical and ( $S_1$ ) is
canonical as well.

In case (1), $S_0$ is either • or • $S_1$, where $S_1$ is canonical. In case (2), $S_0$ is of the form
( $S_1$ ), where $S_1$ must have the property that ( $S_1$ ) is canonical. In case (3), $S_0$ is of the
form ( $S_1$ ) $S_2$, where it must be that ( $S_1$ ) is canonical and $S_2$ is canonical. By applying
corresponding rules and the induction hypothesis, it follows that $S \in L(G_R)$.

It now follows by induction that $L_1 = L(G)$, $L_2 = L(G_R)$. A similar proof by induction
shows that the grammar $G$ is non-ambiguous.

Generating Functions

Now, let $s_n$ denote the number of canonical secondary structures on a length $n$ RNA sequence.
Then $s_n$ is the $n$th Taylor coefficient of the generating function $S(z) = \sum_{n \ge 0} s_n z^n$, denoted
by $s_n = [z^n]S(z)$. Similarly, let $R(z) = \sum_{n \ge 0} R_n z^n$ be the generating function for the number
of secondary structures on $[1, n]$ with $\theta = 1$, which become canonical when surrounded by a
closing set of parentheses.

By Table 1, the non-ambiguous grammar (2) gives the following equations

$$S(z) = z + S(z)z + R(z)z^2 + S(z)R(z)z^2 \tag{3}$$
$$R(z) = z^3 + R(z)z^2 + S(z)R(z)z^4 + S(z)z^3 \tag{4}$$

which can be solved explicitly (solve the second equation for $R$ and inject this in the first
equation):

$$S(z) = \frac{1 - z - z^2 + z^3 - z^5 - \sqrt{F(z)}}{2z^4} \tag{5}$$

and

$$S(z) = \frac{1 - z - z^2 + z^3 - z^5 + \sqrt{F(z)}}{2z^4} \tag{6}$$

where

$$F(z) = 4z^5(-1 + z^2 - z^4) + (-1 + z + z^2 - z^3 + z^5)^2. \tag{7}$$

When evaluated at $z = 0$, Equation (6) gives $\lim_{z \to 0} S(z) = \infty$. Since $S(z)$ is known to be
analytic at 0, we conclude that $S(z)$ is given by (5).
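Independently of the closed form, equations (3)-(4) can be read as recurrences on Taylor coefficients, which gives exact counts that are easy to cross-check by exhaustive enumeration for small $n$. The following sketch is an illustration, not the authors' supplementary code; the function names are hypothetical and the dot • is written as `.`.

```python
from itertools import product


def canonical_counts(n_max):
    """Coefficients s[n], r[n] of S(z), R(z) obtained by iterating (3)-(4)."""
    s = [0] * (n_max + 1)
    r = [0] * (n_max + 1)

    def conv(m):
        # coefficient of z^m in S(z)R(z); empty sum when m < 2
        return sum(s[i] * r[m - i] for i in range(1, m))

    for n in range(1, n_max + 1):
        r[n] = ((1 if n == 3 else 0) + (r[n - 2] if n >= 2 else 0)
                + conv(n - 4) + (s[n - 3] if n >= 3 else 0))
        s[n] = ((1 if n == 1 else 0) + s[n - 1]
                + (r[n - 2] if n >= 2 else 0) + conv(n - 2))
    return s, r


def is_canonical(word, theta=1):
    """True if the dot-bracket word is a secondary structure with no lonely pair."""
    stack, pairs = [], set()
    for j, ch in enumerate(word):
        if ch == "(":
            stack.append(j)
        elif ch == ")":
            if not stack or j - stack[-1] <= theta:
                return False
            pairs.add((stack.pop(), j))
    if stack:
        return False
    return all((i - 1, j + 1) in pairs or (i + 1, j - 1) in pairs
               for i, j in pairs)


def brute_force_canonical(n):
    """Count canonical structures of length n by enumerating all words."""
    return sum(is_canonical(w) for w in product("().", repeat=n))
```

For $n = 1, \ldots, 8$ both approaches give 1, 1, 1, 1, 2, 4, 8, 14; for example, the two canonical structures of length 5 are ••••• and ((•)).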

Location of the dominant singularity 

The square root function $\sqrt{z}$ has a singularity at $z = 0$, so we are led to investigate the roots
of $F(z)$. A numerical computation with Mathematica™ gives the 10 roots 0.508136, 4.11674,
-0.868214 - 0.619448i, -0.868214 + 0.619448i, -0.799805 - 0.367046i, -0.799805 + 0.367046i,
0.410134 - 0.564104i, 0.410134 + 0.564104i, 0.945448 - 0.470929i, 0.945448 + 0.470929i. It
follows that $\rho = 0.508136$ is the root of $F(z)$ having smallest (complex) modulus.

Asymptotics 

Let $T(z) = \frac{1 - z - z^2 + z^3 - z^5}{2z^4}$ and factor $1 - z/\rho$ out of $F(z)$ to obtain $Q(z)(1 - z/\rho) = F(z)$. It
follows that

$$S(z) - T(\rho) = -\frac{\sqrt{Q(\rho)}}{2\rho^4} \cdot (1 - z/\rho)^{\alpha} + O(1 - z/\rho),$$

where $\alpha = 1/2$. This shows that $\rho$ is indeed a dominant singularity for $S$. Note that for
each $n \ge 1$, $S(z)$ and $S(z) - T(\rho)$ have the same Taylor coefficient of index $n$, namely $s_n$.
Now, it is a direct consequence of Theorem 1 that

$$s_n \sim \frac{K(\rho)}{\Gamma(-\alpha)} \cdot n^{-\alpha-1} \cdot (1/\rho)^n, \qquad n \to \infty \tag{8}$$

where $\alpha = 1/2$ and $K(z) = -\frac{\sqrt{Q(z)}}{2z^4}$. Plugging $\rho = 0.508136$ into equation (8), we derive the
following theorem, first obtained by Hofacker, Schuster and Stadler [9] by a different method.

Theorem 2 The asymptotic number of canonical secondary structures on $[1, n]$ is

$$2.1614 \cdot n^{-3/2} \cdot 1.96798^n. \tag{9}$$
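The constants in Theorem 2 can be reproduced numerically from $F(z)$ alone. The sketch below is a hedged illustration using only the standard library: it locates $\rho$ by bisection on an interval chosen by hand (an assumption justified by the list of roots above), uses $Q(\rho) = -\rho F'(\rho)$, which follows from $F(z) = Q(z)(1 - z/\rho)$, and then evaluates $K(\rho)/\Gamma(-1/2)$. Function names are hypothetical.

```python
import math


def g(z):
    return -1 + z + z**2 - z**3 + z**5


def F(z):
    """The polynomial F(z) of equation (7)."""
    return 4 * z**5 * (-1 + z**2 - z**4) + g(z) ** 2


def dF(z):
    """Derivative F'(z), computed by hand from (7)."""
    dg = 1 + 2 * z - 3 * z**2 + 5 * z**4
    return (20 * z**4 * (-1 + z**2 - z**4)
            + 4 * z**5 * (2 * z - 4 * z**3)
            + 2 * g(z) * dg)


def bisect_root(f, lo, hi, iterations=100):
    """Plain bisection; assumes f changes sign exactly once on [lo, hi]."""
    sign_lo = f(lo) > 0
    for _ in range(iterations):
        mid = (lo + hi) / 2
        if (f(mid) > 0) == sign_lo:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


def canonical_asymptotics():
    rho = bisect_root(F, 0.4, 0.6)
    Q_rho = -rho * dF(rho)                    # Q(rho) from F = Q * (1 - z/rho)
    K = -math.sqrt(Q_rho) / (2 * rho**4)      # coefficient of (1 - z/rho)^(1/2)
    constant = K / math.gamma(-0.5)           # Theorem 1 with alpha = 1/2
    return rho, constant
```

Calling `canonical_asymptotics()` returns values close to $\rho = 0.508136$ and to the multiplicative constant 2.1614, and `1 / rho` is close to the growth rate 1.96798.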
2.2 Asymptotic expected number of base pairs in canonical structures

In this section, we derive the expected number of base pairs in canonical secondary structures 
on [1, n]. 

Generating Functions 

The DSV methodology is actually able to produce multivariate generating series. Modifying
the equations (3)-(4) by adding a new variable $u$, intended to count the number of base pairs,
we get

$$S(z,u) = z + S(z,u)z + R(z,u)uz^2 + S(z,u)R(z,u)uz^2 \tag{10}$$
$$R(z,u) = uz^3 + R(z,u)uz^2 + S(z,u)R(z,u)u^2z^4 + S(z,u)uz^3. \tag{11}$$

This can be solved as before to yield the solution*

$$S(z,u) = \sum_{n \ge 0} \sum_{k \ge 0} s_{n,k} z^n u^k
= \frac{1 - z - uz^2 + uz^3 - u^2z^5 - \sqrt{4u^2z^5(-1 + uz^2 - u^2z^4) + (-1 + z + uz^2 - uz^3 + u^2z^5)^2}}{2u^2z^4}.$$



Here, the coefficient $s_{n,k}$ is the number of canonical secondary structures of size $n$ with $k$
base pairs. Using a classical observation on multivariate generating functions, we recover the
expected number of base pairs in a canonical secondary structure on $[1, n]$ using the partial
derivative of $S(z, u)$; indeed,

$$\frac{[z^n]\frac{\partial S}{\partial u}(z,1)}{[z^n]S(z,1)}
= \frac{[z^n]\left(\sum_{i \ge 0}\sum_{k \ge 0} s_{i,k}\, k\, z^i u^{k-1}\right)(z,1)}{[z^n]S(z,1)}
= \sum_{k \ge 0} k \cdot \frac{s_{n,k}}{s_n},$$

and $s_{n,k}/s_n$ is the (uniform) probability that a canonical secondary structure on $[1, n]$ has
exactly $k$ base pairs.

We compute that $G(z) = \frac{\partial S}{\partial u}(z, 1)$ satisfies

$$G(z) = -\frac{(z^2 - 2)\left(T(z) - \sqrt{F(z)} + z\sqrt{F(z)}\right)}{2z^4\sqrt{F(z)}}$$

where $T(z) = 1 - 2z + 2z^3 - z^4 - 3z^5 + z^6$ and $F(z)$ is as in (7). Simplification yields

$$G(z) = -\frac{(z^2 - 2)(z - 1)}{2z^4} - \frac{T(z)(z^2 - 2)}{2z^4}\left(\frac{1}{\sqrt{F(z)}}\right).$$

*Since $S(z,u)$ is known to be analytic at 0, we have discarded one of the two solutions as before.
Asymptotics

From this expression, it is clear that the dominant singularity is again located at the same
$\rho = 0.508136$. A local expansion there gives

$$G(z) \sim K(\rho)(1 - z/\rho)^{-1/2}, \qquad z \to \rho,$$

with $K(z) = -\frac{T(z)(z^2 - 2)}{2z^4\sqrt{Q(z)}}$. By Theorem 1, applied with $\alpha = -1/2$, we obtain the asymptotic value

$$\frac{K(\rho)}{\Gamma(-\alpha)} \cdot n^{-1/2} \cdot (1/\rho)^n. \tag{12}$$

Plugging $\rho = 0.508136$ into equation (12), we find the asymptotic value of $[z^n]\frac{\partial S}{\partial u}(z, 1)$ is

$$0.68568 \cdot n^{-1/2} \cdot 1.96798^n. \tag{13}$$

Dividing (13) by the asymptotic number $[z^n]S(z)$ of canonical secondary structures, given in
(9), we have the following theorem.

Theorem 3 The asymptotic expected number of base pairs in canonical secondary structures 
is 0.31724 • n. 
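Theorem 3 can be sanity-checked with exact arithmetic, without the closed form. Differentiating (10)-(11) with respect to $u$ and setting $u = 1$ gives linear equations for $G(z) = \frac{\partial S}{\partial u}(z,1)$ and $H(z) = \frac{\partial R}{\partial u}(z,1)$; the name $H$ and the recurrences below are introduced here for illustration only and are not part of the text. The coefficient $g_n = [z^n]G(z)$ is the total number of base pairs over all canonical structures of length $n$, so $g_n / s_n$ is the exact expectation.

```python
def canonical_moments(n_max):
    """Return (s, g): s[n] = number of canonical structures of length n,
    g[n] = total number of base pairs summed over those structures."""
    s = [0] * (n_max + 1)
    r = [0] * (n_max + 1)
    g = [0] * (n_max + 1)   # coefficients of dS/du at u = 1
    h = [0] * (n_max + 1)   # coefficients of dR/du at u = 1

    def conv(a, b, m):
        # coefficient of z^m in a(z) * b(z); empty sum when m < 2
        return sum(a[i] * b[m - i] for i in range(1, m))

    def at(a, m):
        # coefficient lookup that treats negative indices as zero
        return a[m] if m >= 0 else 0

    for n in range(1, n_max + 1):
        sr4, sr2 = conv(s, r, n - 4), conv(s, r, n - 2)
        r[n] = (n == 3) + at(r, n - 2) + sr4 + at(s, n - 3)
        s[n] = (n == 1) + s[n - 1] + at(r, n - 2) + sr2
        # H = z^3 + z^2 (R + H) + z^4 (2 S R + G R + S H) + z^3 (S + G)
        h[n] = ((n == 3) + at(h, n - 2) + at(r, n - 2)
                + conv(g, r, n - 4) + conv(s, h, n - 4) + 2 * sr4
                + at(g, n - 3) + at(s, n - 3))
        # G = z G + z^2 (R + H) + z^2 (S R + G R + S H)
        g[n] = (g[n - 1] + at(h, n - 2) + at(r, n - 2)
                + conv(g, r, n - 2) + conv(s, h, n - 2) + sr2)
    return s, g
```

For $n = 5$ the two canonical structures ••••• and ((•)) carry 0 and 2 base pairs, so `g[5] == 2` and the exact expectation is 1. For larger `n`, the ratio `g[n] / s[n] / n` drifts toward the constant 0.31724 of Theorem 3.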



2.3 Asymptotic number of saturated structures 

An RNA secondary structure is saturated if it is not possible to add any base pair without 
violating the definition of secondary structures. If one models the folding of an RNA secondary 
structure as a random walk on a Markov chain (i.e. by the Metropolis-Hastings algorithm), 
then saturated structures correspond to kinetic traps with respect to the Nussinov energy 
model [15]. The asymptotic number of saturated structures was determined in [3] by using a 
method known as Bender's Theorem, as rectified by Meir and Moon [13]. In this section, we
apply the DSV methodology to obtain the same asymptotic limit, and in the next section we 
obtain the expected number of base pairs of saturated structures. 



Grammar 

Consider the context-free grammar with nonterminal symbols $S$, $R$, terminal symbols •, ( , ) ,
start symbol $S$ and production rules

S → • | •• | R• | R•• | (S) | S(S)          (14)
R → (S) | R(S)                              (15)

It can be shown by induction on expression length that $L(S)$ is the set of saturated structures,
and $L(R)$ is the set of saturated structures with no visible position; i.e. external to every base
pair [3]. Here, position $i$ is visible in a secondary structure $T$ if it is external to every base
pair of $T$; i.e. for all $(x, y) \in T$, $i < x$ or $i > y$.
Generating Functions

Let

$$S(z) = \sum_{i=0}^{\infty} s_i \cdot z^i, \qquad R(z) = \sum_{i=0}^{\infty} r_i \cdot z^i \tag{16}$$

denote the generating functions $S$ resp. $R$, corresponding to the problems of counting the number
of saturated secondary structures resp. the number of saturated structures having no visible
positions. Applying Table 1, we are led to the equations

$$S = z + z^2 + zR + z^2R + z^2S + z^2S^2 \tag{17}$$
$$R = z^2S + z^2RS. \tag{18}$$
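Equations (17)-(18) can again be iterated as coefficient recurrences and compared with a direct enumeration based on the definition of saturation. The sketch below is illustrative only (hypothetical function names, 0-indexed positions, • written as `.`, $\theta = 1$): a structure is accepted as saturated when no two unpaired positions can be joined without crossing an existing pair or violating the threshold.

```python
from itertools import product


def saturated_counts(n_max):
    """Coefficients s[n], r[n] of S(z), R(z) obtained by iterating (17)-(18)."""
    s = [0] * (n_max + 1)
    r = [0] * (n_max + 1)

    def conv(a, b, m):
        # coefficient of z^m in a(z) * b(z); empty sum when m < 2
        return sum(a[i] * b[m - i] for i in range(1, m))

    for n in range(1, n_max + 1):
        r[n] = (s[n - 2] if n >= 2 else 0) + conv(r, s, n - 2)
        s[n] = ((n == 1) + (n == 2) + r[n - 1]
                + (r[n - 2] if n >= 2 else 0)
                + (s[n - 2] if n >= 2 else 0) + conv(s, s, n - 2))
    return s, r


def is_saturated(word, theta=1):
    """True if word is a secondary structure to which no base pair can be added."""
    stack, pairs = [], []
    for j, ch in enumerate(word):
        if ch == "(":
            stack.append(j)
        elif ch == ")":
            if not stack or j - stack[-1] <= theta:
                return False
            pairs.append((stack.pop(), j))
    if stack:
        return False
    free = [p for p, ch in enumerate(word) if ch == "."]
    for a, i in enumerate(free):
        for j in free[a + 1:]:
            crosses = any((k < i < l) != (k < j < l) for k, l in pairs)
            if j - i > theta and not crosses:
                return False          # the pair (i, j) could still be added
    return True


def brute_force_saturated(n):
    """Count saturated structures of length n by enumerating all words."""
    return sum(is_saturated(w) for w in product("().", repeat=n))
```

For $n = 1, \ldots, 6$ both computations give 1, 1, 1, 3, 5, 8; for instance the three saturated structures of length 4 are (•)•, •(•) and (••).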



Location of the dominant singularity 

By first solving (18) for $R$ and injecting in (17), we get

$$S = z + z^2 + z^2S + z^2S^2 + (z + z^2)\,\frac{z^2S}{1 - z^2S}, \tag{19}$$

which upon normalizing gives a polynomial equation of the third degree

$$P(z, S) = -S^3z^4 + z(1 + z) - S^2z^2(-2 + z^2) + S(-1 + z^2) = 0. \tag{20}$$

Unlike earlier work in this paper, direct solution of this equation by Cardano's formulas gives
expressions that are difficult to handle. Instead, we locate the singularity by appealing to
general techniques for implicit generating functions [18, §VII.4].

By the implicit function theorem, singularities of a solution $S(z)$ of $P(z, S) = 0$ can only
occur at points where both $P$ and its partial derivative

$$\frac{\partial P}{\partial S}(z, S) = -1 + (1 + 4S)z^2 - S(2 + 3S)z^4 \tag{21}$$

vanish simultaneously.

The common roots of $P$ and $\partial P/\partial S$ can be located by eliminating $S$ between those two
equations, for instance using the classical theory of resultants (see, e.g., [10]). This gives a
polynomial

$$Q(z) = z^{11}(1 + z)(4 + z - 7z^2 - 28z^3 - 32z^4 + 4z^6), \tag{22}$$

that vanishes at all $z$ such that $(z, S)$ is a common root of $P$ and $\partial P/\partial S$.

Numerical computation of the roots of $Q$ yields 0, -1, -2.29493, -0.854537,
-0.244657 - 0.5601i, -0.244657 + 0.5601i, 0.424687, 3.2141.

A subtle difficulty now lies in selecting among those points the dominant singularity of the
analytic continuation of the solution $S$ of (19) corresponding to the combinatorial problem.
Indeed, it is possible that one solution of (19) is singular at a given $r$ without the solution of
interest being singular there. Considering such a singularity would result in an asymptotic
expansion that is wrong by an exponential factor. One way to select the correct singularity
is to apply a result by Meir and Moon [13] to Equation (19). This results in a variant of the
computation in [3].

Instead, we use Pringsheim's theorem (see, e.g., [18]).
Theorem 4 (Pringsheim) If $S(z)$ has a series expansion at 0 that has nonnegative
coefficients and a radius of convergence $R$, then the point $z = R$ is a singularity of $S(z)$.

In our example, there are only two possible real positive singularities, 0.424687 and 3.2141.
The latter cannot be dominant, since it would lead to asymptotics of the form $3.2141^{-n}$,
i.e., an exponentially decreasing number of structures. Thus the dominant singularity is at
$\rho = 0.424687$. Since the modulus of the non-real roots of $Q$ is $0.611203 > \rho$, the conditions of
Theorem 1 hold, provided the function behaves as required as $z \to \rho$.

Asymptotics 

We now compute the local expansion of $S(z)$ at $\rho$. From equation (20), we have that

$$P(\rho, S) = 0.605047 - 0.819641\,S + 0.328189\,S^2 - 0.0325295\,S^3 \tag{23}$$

whose (numerical approximations of) roots are the double root $S = 1.6569$ and single root
$S = 6.77518$. It is easily checked that 1.6569 is the only root of equation (23) in which $P(\rho, S)$
is increasing; thus we let $T = 1.6569$.

Recall Taylor's theorem in two variables

$$f(x, y) = \sum_{n=0}^{\infty}\sum_{k=0}^{\infty} \frac{\partial^{n+k} f(x_0, y_0)}{\partial x^n\,\partial y^k} \cdot \frac{(x - x_0)^n}{n!} \cdot \frac{(y - y_0)^k}{k!}.$$

We now expand P(z, S) at z = p and S = T and invert this expansion. This yields 

ftp pip i #p 

P(z, S) = P(p, T) + ^(p, T)(S -T) + ^-(p, T)(z -p) + -^_(p,T)(S-Tf + ... (24) 

where the dots indicate terms of higher order. The first two terms are 0, so by denoting 
P z = |f (p,T) and P SS = 0(P,T), we have 

= P = P z (z - p) + lp tg (S - Tf + 0(S - Tf + 0((z -p)(S- Tf) + 0((z - p) 2 ). (25) 

Isolating $(S-T)^2$ we get

$$(S-T)^2 = -\frac{2P_z\,(z-\rho)}{P_{SS}} + O((z-\rho)^2) + O((S-T)^3),$$

and hence

$$S - T = \pm\sqrt{\frac{2\rho P_z}{P_{SS}}}\,\sqrt{1 - z/\rho} + O(z-\rho).$$

Since $[z^n]S(z)$ is the number of saturated secondary structures on [1,n] and the Taylor
coefficients in the expansion of $\sqrt{1 - z/\rho}$ are negative, we discard the positive root and thus
obtain

$$S - T = -\sqrt{\frac{2\rho P_z}{P_{SS}}}\,\sqrt{1 - z/\rho} + O(z-\rho). \tag{26}$$

We now make use of Theorem 1 as before and recover the following result, proved earlier
in [3] by the Bender-Meir-Moon method.

Theorem 5 The asymptotic number of saturated structures is $1.07427 \cdot n^{-3/2} \cdot 2.35468^n$.
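
As a numerical sanity check of the computation above, the following Python sketch recomputes the coefficients of (23), tests that T is (numerically) a double root, and evaluates the multiplicative constant of Theorem 5. It is only an illustration under two assumptions: that P(z, S) is the polynomial (29) evaluated at u = 1, and that Theorem 1 yields the standard transfer $[z^n]\sqrt{1-z/\rho} \sim -\rho^{-n} n^{-3/2}/(2\sqrt{\pi})$. All function names are hypothetical.

```python
import math

RHO = 0.424687   # dominant singularity quoted in the text (rounded)
T = 1.6569       # double root of P(rho, S) quoted in the text (rounded)

def coeffs(z):
    """Coefficients (c0, c1, c2, c3) of P(z, S), viewed as a cubic in S.

    Assumption: P is (29) at u = 1. With a = z^2 and b = z + z^2 it expands to
    P = b + (a - 1) S + (2a - a^2) S^2 - a^2 S^3.
    """
    a, b = z * z, z + z * z
    return b, a - 1.0, 2.0 * a - a * a, -a * a

def P(z, s):
    c0, c1, c2, c3 = coeffs(z)
    return c0 + c1 * s + c2 * s**2 + c3 * s**3

def P_S(z, s):
    """First derivative of P with respect to S."""
    _, c1, c2, c3 = coeffs(z)
    return c1 + 2 * c2 * s + 3 * c3 * s**2

def P_SS(z, s):
    """Second derivative of P with respect to S."""
    _, _, c2, c3 = coeffs(z)
    return 2 * c2 + 6 * c3 * s

def P_z(z, s, h=1e-6):
    """Derivative in z by a central finite difference (enough for a check)."""
    return (P(z + h, s) - P(z - h, s)) / (2 * h)

def growth_constant(rho=RHO, t=T):
    """Constant c in  c * n^(-3/2) * rho^(-n), from expansion (26)."""
    k = math.sqrt(2 * rho * P_z(rho, t) / P_SS(rho, t))
    return k / (2 * math.sqrt(math.pi))

if __name__ == "__main__":
    print("coefficients of (23):", coeffs(RHO))
    print("P and dP/dS at (rho, T):", P(RHO, T), P_S(RHO, T))
    print("constant and growth rate:", growth_constant(), 1 / RHO)
```

Because the inputs are the rounded values printed in the text, the check can only be expected to agree to about four or five significant digits.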
2.4 Expected number of base pairs of saturated structures

In this section, we compute the expected number of base pairs of saturated structures,
proceeding as in Section 2.2 by first modifying the equations to obtain bivariate generating
functions and then differentiating with respect to the new variable and evaluating at 1 to
obtain the asymptotic expectation.

Generating Functions 

We first modify equations (17)-(18) by introducing the auxiliary variable u, responsible for
counting the number of base pairs:

$$S = z + z^2 + zR + z^2R + uz^2S + uz^2S^2 \tag{27}$$
$$R = uz^2S + uz^2RS. \tag{28}$$

Solving the second equation for R and substituting into the first one gives

$$P(z,u,S) = Suz^2(z+z^2) - (-1+Suz^2)(-S+z+z^2+Suz^2+S^2uz^2). \tag{29}$$

Asymptotics 

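We are interested in the coefficients of $\partial S/\partial u$ at u = 1. Differentiating (29) with respect to u
gives

$$\frac{\partial P}{\partial u} + \frac{\partial P}{\partial S}\,\frac{\partial S}{\partial u} = 0.$$

Using equation (26), we replace S(z, 1) by $T + K\sqrt{1-z/\rho} + O(1-z/\rho)$ in this equation to
obtain

$$\Big(\rho^2 T\big(1 + 2(1-\rho^2)T - 2\rho^2T^2\big) + O(\sqrt{1-z/\rho})\Big) + \Big((4K\rho^2 - 2K\rho^4 - 6K\rho^4T)\sqrt{1-z/\rho} + O(1-z/\rho)\Big)\frac{\partial S}{\partial u} = 0$$

and finally

$$\left.\frac{\partial S}{\partial u}\right|_{u=1} \sim \frac{0.642305}{\sqrt{1-z/\rho}}.$$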

Applying Theorem 1 to equation (30) gives

$$[z^n]\frac{\partial S}{\partial u}(z,1) \sim 0.362417 \cdot n^{-1/2} \cdot \rho^{-n}.$$

It follows that the asymptotic expected number of base pairs in saturated structures on [1,n]
is

$$\frac{[z^n]\frac{\partial S}{\partial u}(z,1)}{[z^n]S(z,1)} \sim \frac{0.362417 \cdot n^{-1/2} \cdot \rho^{-n}}{1.07427 \cdot n^{-3/2} \cdot \rho^{-n}} = 0.337361 \cdot n.$$

We have just proved the following.

Theorem 6 The asymptotic expected number of base pairs for saturated structures is
$0.337361 \cdot n$.
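
For finite n, the same expectation can be computed exactly by extracting coefficients from the system (27)-(28). The Python sketch below is an illustration only (it is not the program of the web supplement, and its helper names are hypothetical). It stores each coefficient $[z^n]S(z,u)$ as a list of integers indexed by the number k of base pairs, and assumes $\theta = 1$ as in (27)-(28).

```python
def _add(p, q):
    """Sum of two polynomials in u (coefficient lists, index = power of u)."""
    if len(p) < len(q):
        p, q = q, p
    return [c + (q[i] if i < len(q) else 0) for i, c in enumerate(p)]

def _mul(p, q):
    """Product of two polynomials in u; the empty list stands for zero."""
    if not p or not q:
        return []
    out = [0] * (len(p) + len(q) - 1)
    for i, a in enumerate(p):
        for j, b in enumerate(q):
            out[i + j] += a * b
    return out

def _conv(A, B, m):
    """Coefficient of z^m in the product A(z) * B(z)."""
    total = []
    for i in range(m + 1):
        total = _add(total, _mul(A[i], B[m - i]))
    return total

def saturated_by_pairs(nmax):
    """Return S with S[n][k] = number of saturated structures on [1,n]
    having k base pairs, obtained by iterating (27)-(28)."""
    S = [[] for _ in range(nmax + 1)]
    R = [[] for _ in range(nmax + 1)]
    for n in range(1, nmax + 1):
        if n >= 2:
            # (28): R = u z^2 (S + R S)
            inner = _add(S[n - 2], _conv(R, S, n - 2))
            R[n] = [0] + inner if inner else []
        # (27): S = z + z^2 + (z + z^2) R + u z^2 (S + S^2)
        s = [1] if n <= 2 else []
        s = _add(s, R[n - 1])
        if n >= 2:
            s = _add(s, R[n - 2])
            inner = _add(S[n - 2], _conv(S, S, n - 2))
            if inner:
                s = _add(s, [0] + inner)
        S[n] = s
    return S

def mean_pairs(counts):
    """Average number of base pairs for one row S[n]."""
    return sum(k * c for k, c in enumerate(counts)) / sum(counts)

if __name__ == "__main__":
    table = saturated_by_pairs(60)
    for n in (10, 20, 40, 60):
        print(n, sum(table[n]), mean_pairs(table[n]) / n)
```

The ratio printed in the last column should approach 0.337361 as n grows, with a correction of order 1/n.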

Since the Taylor coefficient $s_{n,k}$ of the generating function $S(z,u) = \sum_{n,k} s_{n,k} z^n u^k$ is equal to the
number of saturated structures on [1,n] having k base pairs, it is possible that the methods of this
section will suffice to solve the following open problem.

Open Problem 1 Clearly, the maximum number of base pairs in a saturated structure on
[1,n] where $\theta = 1$ is $\lfloor (n-1)/2 \rfloor$. For fixed values of k, what is the asymptotic number
$s_{n,\lfloor (n-1)/2 \rfloor - k}$ of saturated secondary structures having exactly k base pairs fewer than the maximum?

Note that in [3], we solved this problem for k = 0, 1. 

A related interesting question concerns whether the number of secondary structures $s_{n,k}$
having k base pairs is approximately Gaussian. As first suggested by Y. Ponty (personal
communication), this is indeed the case. More formally, consider for fixed n the finite
distribution $P_n = p_1, \ldots, p_n$, where $p_k = s_{n,k}/s_n$ and $s_n = \sum_k s_{n,k}$. In the Nussinov energy model, the
energy of a secondary structure with k base pairs is $-k$, so the distribution $P_n$ is what is
usually called the density of states in physical chemistry. It follows from Theorem 1 of Drmota
[6] (see also [5]) that $P_n$ is asymptotically Gaussian. Similarly, it follows from Theorem 1 of Drmota that the
asymptotic distribution of density of states of both canonical and saturated structures is
Gaussian. Details of a Maple session applying Drmota's theorem to saturated structures appear in
the web supplement http://bioinformatics.bc.edu/clotelab/SUPPLEMENTS/JBCBasymptotics/

2.5 Asymptotic number of saturated stem-loops 

Define a stem-loop to be a secondary structure S having a unique base pair $(i_0, j_0) \in S$, for
which all other base pairs $(i,j) \in S$ satisfy the relation $i < i_0 < j_0 < j$. In this case, $(i_0, j_0)$
defines a hairpin, and the remaining base pairs, as well as possible internal loops and bulges,
constitute the stem. We have the following simple result due to Stein and Waterman [19].

Proposition 1 There are $2^{n-2} - 1$ stem-loop structures^ on [1, n].

Proof. Let L(n) denote the number of secondary structures with at most one loop on
(1, . . . , n). Then L(1) = 1 = L(2). There are two cases to consider for L(n + 1).

Case 1. If n + 1 does not form a base pair, then we have a contribution of L(n).

Case 2. n + 1 forms a base pair with some $1 \le j \le n - 1$. In this case, since only one hairpin
loop is allowed, there is no base-pairing for the subsequence $s_1, \ldots, s_{j-1}$, and hence if n + 1
base-pairs with j, then we have a contribution of L(n - (j + 1) + 1) = L(n - j). Hence

$$L(n+1) = L(n) + \sum_{j=1}^{n-1} L(n-j) = L(n) + L(n-1) + \cdots + L(1)$$

and hence L(1) = 1, L(2) = 1, L(3) = 2, and from there $L(n) = 2^{n-2}$ by induction. Since L(n)
also counts the structure with no base pairs, there are $2^{n-2} - 1$ stem-loops. ■
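
The induction in the proof can be checked mechanically. The short Python sketch below (the function name is hypothetical) iterates the recurrence for L(n + 1) exactly as written above and compares the result with $2^{n-2}$.

```python
def at_most_one_loop(nmax):
    """Return a list L with L[n] = L(n) for 1 <= n <= nmax (L[0] is unused).

    Uses L(1) = L(2) = 1 and L(n+1) = L(n) + sum_{j=1}^{n-1} L(n-j).
    """
    L = [0, 1, 1]
    for n in range(2, nmax):
        L.append(L[n] + sum(L[n - j] for j in range(1, n)))
    return L[:nmax + 1]

if __name__ == "__main__":
    values = at_most_one_loop(12)
    for n in range(2, 13):
        # L(n) and the number of stem-loops once the empty structure is removed
        print(n, values[n], values[n] - 1)
```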

^In [19], stem-loop structures are called hairpins. Since the appearance of [19], the common convention is that
a hairpin is a structure consisting of a single base pair enclosing a loop region; i.e. ( • ··· • ). Here we use
the more proper term stem-loop.

We now compute the asymptotic number of saturated stem-loop structures. Let h(n) be the
number of saturated stem-loops on [1, n], defined by h(n) = 1 for n = 0, 1, 2, 3, h(4) = 3, and

$$h(n) = h(n-2) + 2h(n-3) + 2h(n-4) \tag{30}$$

for $n \ge 5$. Note that we have defined h(1) = 1 = h(2) for notational ease in the sequel,
although there are in fact no stem-loops of size 1 or 2. Indeed in this case, the only structures
of size 1 respectively 2 are • and • •.

The first few terms in the sequence h(1), h(2), h(3), . . . are 1, 1, 1, 3, 5, 7, 13, 23, 37, 63,
109, 183, 309, 527, 893, 1511, 2565, 4351, 7373, 12503; for instance, h(20) = 12503.
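
The listed values follow directly from the recurrence and the initial conditions h(0) = h(1) = h(2) = h(3) = 1, h(4) = 3. A minimal Python sketch (the function name is hypothetical):

```python
def saturated_stem_loops(nmax):
    """Return [h(0), h(1), ..., h(nmax)] from the recurrence
    h(n) = h(n-2) + 2 h(n-3) + 2 h(n-4) for n >= 5."""
    h = [1, 1, 1, 1, 3]
    for n in range(5, nmax + 1):
        h.append(h[n - 2] + 2 * h[n - 3] + 2 * h[n - 4])
    return h[:nmax + 1]

if __name__ == "__main__":
    print(saturated_stem_loops(20)[1:])
```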

Grammar 

It is easily seen that the following rules

S → • | • • | ( S ) | • ( S ) | • • ( S ) | ( S ) • | ( S ) • •

provide for a non-ambiguous context-free grammar to generate all non-empty saturated
stem-loops. It actually defines a special kind of context-free language, called regular, whose
generating function is rational.

Generating Function 

By the DSV methodology, we obtain the functional relation

$$R(z) = z + z^2 + R(z)z^2 + 2R(z)z^3 + 2R(z)z^4$$

whose solution is the rational function

$$R(z) = \frac{z + z^2}{1 - z^2 - 2z^3 - 2z^4} = \frac{P(z)}{Q(z)}$$

where P(z) = z and $Q(z) = 1 - z - 2z^3$ (the common factor 1 + z cancels).

Asymptotics

For rational functions, an easy way to compute the asymptotic behaviour of the Taylor
coefficients is to compute a partial fraction decomposition and isolate the dominant part. This is
equivalent to solving the corresponding linear recurrence. See also [17, p. 325] or [16, Thm.
9.2].

Partial fraction decomposition yields

$$R(z) = \frac{A(a_1)}{1 - z/a_1} + \frac{A(a_2)}{1 - z/a_2} + \frac{A(a_3)}{1 - z/a_3}$$

where the $a_i$'s are the roots of Q and $A(z) = -1/Q'(z)$. It follows by extracting coefficients
that

$$h(n) = A(a_1)a_1^{-n} + A(a_2)a_2^{-n} + A(a_3)a_3^{-n}.$$

(Note that this is an actual equality valid for all n > 0 and not an asymptotic result). Now,
the roots of Q are approximately

$a_1 = 0.5897545$, $a_2 = -0.294877 - 0.872272i$, $a_3 = -0.294877 + 0.872272i$.

Since $|a_2| = |a_3| = .9207 > |a_1|$, it follows that the asymptotic behaviour is given by the term
in $a_1$.

We have proved the following theorem.

Theorem 7 The number h(n) of saturated stem-loops on [1,n] satisfies

$$h(n) \sim 0.323954 \cdot 1.69562^n. \tag{32}$$

Convergence of the asymptotic limit in equation (32) is exponentially fast, so that when
n = 20, $0.323954 \cdot 1.69562^{20} = 12504.2$, while the exact number of saturated stem-loops on
[1,20] is h(20) = 12503.
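
The constants of Theorem 7 can be reproduced numerically. The Python sketch below is an illustration with hypothetical function names. It locates the positive real root $a_1$ of $Q(z) = 1 - z - 2z^3$ by bisection, evaluates $A(a_1) = -1/Q'(a_1) = 1/(1 + 6a_1^2)$, and compares the dominant term with the exact Taylor coefficients of z/Q(z), which satisfy $r_n = r_{n-1} + 2r_{n-3}$.

```python
def dominant_root(lo=0.0, hi=1.0, iters=200):
    """Bisection for the real root of Q(z) = 1 - z - 2 z^3 in (0, 1).

    Q is strictly decreasing there, with Q(0) > 0 > Q(1).
    """
    for _ in range(iters):
        mid = (lo + hi) / 2
        if 1 - mid - 2 * mid**3 > 0:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

def dominant_term(n):
    """A(a1) * a1^(-n), the leading term of h(n)."""
    a1 = dominant_root()
    return a1 ** (-n) / (1 + 6 * a1 * a1)

def exact_coefficient(n):
    """[z^n] z / (1 - z - 2 z^3), via r_n = r_{n-1} + 2 r_{n-3}."""
    r = [0, 1, 1]
    for m in range(3, n + 1):
        r.append(r[m - 1] + 2 * r[m - 3])
    return r[n]

if __name__ == "__main__":
    a1 = dominant_root()
    print("a1, 1/a1, A(a1):", a1, 1 / a1, 1 / (1 + 6 * a1 * a1))
    for n in (5, 10, 20, 40):
        print(n, exact_coefficient(n), dominant_term(n))
```

The absolute difference between the two columns grows slowly (like $|a_2|^{-n}$), while the relative difference decays exponentially, which is the sense in which convergence is fast.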

3 Quasi-random saturated structures 

In this section, we define a stochastic greedy process to generate random saturated structures,
technically denoted quasi-random saturated structures. Our main result is that the expected
number of base pairs in quasi-random saturated structures is $0.340633 \cdot n$, just slightly more
than the expected number $0.337361 \cdot n$ of all saturated structures. This suggests that the
introduction of stochastic greedy algorithms and their asymptotic analysis may prove useful
in other areas of random graph theory.

Consider the following stochastic process to generate a saturated structure. Suppose that
n bases are arranged in sequential order on a line. Select the base pair (1,u) by choosing u,
where $\theta + 2 \le u \le n$, at random with probability $1/(n - \theta - 1)$. The base pair joining 1 and u
partitions the line into two parts. The left region has k bases strictly between 1 and u, where
$k \ge \theta$, and the right region contains the remaining n - k - 2 bases properly contained within
endpoints k + 2 and n (see Figure 3). Proceed recursively on each of the two parts. Observe
that the secondary structures produced by our stochastic process will always base pair with
the leftmost available base, and that the resulting structure is always saturated.

Figure 3: Base 1 is base-paired by selecting a random base u such that there are at least $\theta$ unpaired
bases enclosed between 1 and u.
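
The recursive process just described can be sketched in a few lines of Python. This is an illustrative implementation only: the function names, the explicit stack in place of recursion, and the brute-force saturation test are assumptions of the sketch, not the authors' code.

```python
import random

def quasi_random_saturated(n, theta=1, rng=random):
    """Sample a structure on positions 1..n by repeatedly pairing the
    leftmost base of each remaining interval with a uniformly chosen partner."""
    pairs = []
    stack = [(1, n)]                      # closed intervals still to process
    while stack:
        lo, hi = stack.pop()
        if hi - lo + 1 < theta + 2:
            continue                      # too short to hold a base pair
        # uniform over the (length - theta - 1) admissible partners
        u = rng.randint(lo + theta + 1, hi)
        pairs.append((lo, u))
        stack.append((lo + 1, u - 1))     # the k >= theta enclosed bases
        stack.append((u + 1, hi))         # the bases to the right of u
    return sorted(pairs)

def is_saturated(pairs, n, theta=1):
    """Brute-force test: no two unpaired bases can be joined without
    creating a crossing or a hairpin with fewer than theta bases."""
    paired = {x for p in pairs for x in p}
    free = [x for x in range(1, n + 1) if x not in paired]
    for a in free:
        for b in free:
            if b - a > theta and not any(
                i < a < j < b or a < i < b < j for i, j in pairs
            ):
                return False
    return True

if __name__ == "__main__":
    random.seed(2024)
    sample = quasi_random_saturated(30)
    print(sample, is_saturated(sample, 30))
```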

Before proceeding further, we note that the probability $p_{i,j}$ that (i,j)
is a base pair in a saturated structure is not the same as the probability $q_{i,j}$ that (i,j) is a
base pair in a quasi-random saturated structure. Indeed, if we consider saturated and
quasi-random saturated structures on an RNA sequence of length n = 10, then clearly $p_{1,5} = 1/29$^
while clearly $q_{1,5} = 1/8$. Despite the very different base pairing probabilities when comparing
saturated with quasi-random saturated structures, it is remarkable that the expected number
of base pairs over saturated and quasi-random saturated structures is numerically so close.

^The web supplement contains a Python program to compute the number of saturated structures on n.
Clearly $p_{1,5} = s_3 \cdot s_5 / s_{10}$, where $s_k$ denotes the number of saturated structures on an RNA sequence of length k.
A computation from a Python program (see web supplement) shows that $s_3 = 1$, $s_5 = 5$ and $s_{10} = 145$, hence
$p_{1,5} = 5/145 = 1/29$.

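Let $U_n^\theta$ be the expected number of base pairs of the saturated secondary structure
generated by this recursive procedure. In general, we have the following recursive equation

$$U_n^\theta = 1 + \frac{1}{n-\theta-1}\sum_{k=\theta}^{n-2}\left(U_k^\theta + U_{n-k-2}^\theta\right), \quad n \ge \theta + 2, \tag{33}$$

with initial conditions

$$U_0^\theta = U_1^\theta = \cdots = U_{\theta+1}^\theta = 0, \quad U_{\theta+2}^\theta = U_{\theta+3}^\theta = 1. \tag{34}$$

If we write equation (33) for $U_{n+1}^\theta$ and substitute in it the value for $U_n^\theta$ we derive

$$U_{n+1}^\theta = 1 + \frac{1}{n-\theta}\left(U_{n-1}^\theta + U_{n-\theta-1}^\theta + \sum_{k=\theta}^{n-2}\left(U_k^\theta + U_{n-k-2}^\theta\right)\right) = 1 + \frac{U_{n-1}^\theta + U_{n-\theta-1}^\theta}{n-\theta} + \frac{(n-\theta-1)\left(U_n^\theta - 1\right)}{n-\theta}.$$

If we multiply through by $n - \theta$ and simplify we obtain

$$(n-\theta)\,U_{n+1}^\theta = 1 + (n-\theta-1)\,U_n^\theta + U_{n-1}^\theta + U_{n-\theta-1}^\theta, \tag{35}$$

which is valid for $n \ge \theta + 1$.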
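
Both forms of the recurrence are easy to evaluate exactly. The Python sketch below (illustrative only, with hypothetical names) computes $U_n^\theta$ from (33) with rational arithmetic, using prefix sums so that each step costs constant time, and then tests the simplified relation (35).

```python
from fractions import Fraction

def expected_pairs(nmax, theta=1):
    """Return [U_0, ..., U_nmax] from recurrence (33) as exact fractions."""
    U = [Fraction(0)] * (nmax + 1)
    prefix = [Fraction(0)] * (nmax + 2)   # prefix[m] = U_0 + ... + U_{m-1}
    for n in range(nmax + 1):
        if n >= theta + 2:
            # sum_{k=theta}^{n-2} U_k
            left = prefix[n - 1] - prefix[theta]
            # sum_{k=theta}^{n-2} U_{n-k-2} = U_0 + ... + U_{n-theta-2}
            right = prefix[n - theta - 1]
            U[n] = 1 + (left + right) / (n - theta - 1)
        prefix[n + 1] = prefix[n] + U[n]
    return U

def satisfies_35(U, theta=1):
    """Check (n - theta) U_{n+1} = 1 + (n - theta - 1) U_n + U_{n-1} + U_{n-theta-1}."""
    return all(
        (n - theta) * U[n + 1]
        == 1 + (n - theta - 1) * U[n] + U[n - 1] + U[n - theta - 1]
        for n in range(theta + 1, len(U) - 1)
    )

if __name__ == "__main__":
    U = expected_pairs(200)
    print([str(x) for x in U[:8]])
    print(satisfies_35(U), float(U[200]) / 200)
```

For $\theta = 1$ the ratio $U_n/n$ printed at the end is already close to the limiting constant of the next section, with a correction of order 1/n.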

3.1 Asymptotic behavior 

We now look at asymptotics. In particular we prove the following result. 

Theorem 8 Let $U_n^\theta$ denote the expected number of base pairs for quasi-random saturated
structures of an RNA sequence of length n. Then for fixed $\theta$, and as $n \to \infty$,

$$U_n^\theta \sim K_\theta \cdot n \quad \text{with} \quad K_\theta = e^{-1-H_{\theta+1}} \int_0^1 e^{\,t + (t + t^2/2 + \cdots + t^{\theta+1}/(\theta+1))}\,dt, \tag{36}$$

where $H_{\theta+1} = 1 + \frac{1}{2} + \cdots + \frac{1}{\theta+1}$ is the $(\theta+1)$th harmonic number.

The first few values can easily be obtained numerically and we have

$K_1 = 0.340633$, $K_2 = 0.285497$, $K_3 = 0.247908$, $K_4 = 0.220308$, $K_5 = 0.199018$.
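
The constants $K_\theta$ in (36) involve only a one-dimensional integral of a smooth function, so any standard quadrature rule suffices. The Python sketch below uses the composite Simpson rule; the function name and the number of subintervals are arbitrary choices of this illustration.

```python
import math

def K(theta, steps=2000):
    """Approximate K_theta from (36) with the composite Simpson rule.

    `steps` must be even; the integrand is exp(t + sum_{j<=theta+1} t^j / j).
    """
    def integrand(t):
        return math.exp(t + sum(t**j / j for j in range(1, theta + 2)))

    h = 1.0 / steps
    total = integrand(0.0) + integrand(1.0)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * integrand(i * h)
    integral = total * h / 3

    harmonic = sum(1.0 / j for j in range(1, theta + 2))   # H_{theta+1}
    return math.exp(-1 - harmonic) * integral

if __name__ == "__main__":
    for theta in range(1, 6):
        print(theta, round(K(theta), 6))
```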

Proof. For fixed integer $\theta$, the recurrence (35) is linear with polynomial coefficients. It is
a classical result that the generating functions of solutions of such recurrences satisfy linear
differential equations. This is obtained by applying the following rules: if $U(z) = \sum_{n \ge 0} u_n z^n$,
then

$$\sum_{n \ge 0} n\,u_n z^n = zU'(z), \qquad \sum_{n \ge 0} u_{n+k} z^n = \frac{1}{z^k}\left(U(z) - u_0 - u_1 z - \cdots - u_{k-1}z^{k-1}\right).$$

Starting from (35), we first shift the index by $\theta + 1$ and apply these rules together with the
initial conditions (34) to get, writing y for the generating function $\sum_{n \ge 0} U_n^\theta z^n$,

$$(n+\theta+2)\,U_{n+\theta+2}^\theta - (\theta+1)\,U_{n+\theta+2}^\theta = 1 + (n+\theta+1)\,U_{n+\theta+1}^\theta - (\theta+1)\,U_{n+\theta+1}^\theta + U_{n+\theta}^\theta + U_n^\theta,$$

$$\frac{1}{z^{\theta+1}}\,y' - \frac{\theta+1}{z^{\theta+2}}\,y = \frac{1}{1-z} + \frac{1}{z^{\theta}}\,y' - \frac{\theta+1}{z^{\theta+1}}\,y + \frac{y}{z^{\theta}} + y.$$

Finally, this simplifies to

$$z(1-z)\,y' + \left((\theta+1)(z-1) - z^2 - z^{\theta+2}\right)y = \frac{z^{\theta+2}}{1-z}. \tag{37}$$

This is a first order non-homogeneous linear differential equation. The homogeneous part

$$z(1-z)\,W' + \left((\theta+1)(z-1) - z^2 - z^{\theta+2}\right)W = 0$$

is solved by integrating a partial fraction decomposition

$$\frac{W'(z)}{W(z)} = \frac{\theta+1}{z} - \frac{z}{z-1} - \frac{z^{\theta+1}}{z-1} = \frac{\theta+1}{z} - \frac{2}{z-1} - 1 - (1 + z + \cdots + z^{\theta}),$$

$$\log W = (\theta+1)\log z - 2\log(1-z) - z - \left(z + z^2/2 + \cdots + z^{\theta+1}/(\theta+1)\right),$$

$$W(z) = \frac{z^{\theta+1}}{(1-z)^2}\,e^{-z-(z + z^2/2 + \cdots + z^{\theta+1}/(\theta+1))}.$$

From there, variation of the constant gives the following expression for the generating function:

$$y = \frac{z^{\theta+1}\,e^{-z-(z + z^2/2 + \cdots + z^{\theta+1}/(\theta+1))}}{(1-z)^2} \int_0^z e^{\,t + (t + t^2/2 + \cdots + t^{\theta+1}/(\theta+1))}\,dt.$$

Because the exponential is an entire function, we readily find that the only singularity is
at z = 1, where $y \sim K_\theta/(1-z)^2$ with $K_\theta$ as in the statement of the theorem. The proof is
completed by the use of Theorem 1. ■

Note that the asymptotic expected number of base pairs in quasi-random saturated
structures with $\theta = 1$ is $0.340633 \cdot n$, while by Theorem 6 the asymptotic expected number of base
pairs in saturated structures is $0.337361 \cdot n$, just very slightly less. This result points out
that the stochastic greedy method performs reasonably well in sampling saturated structures,
although the stochastic process tends not to sample certain (rare) saturated structures having
a less than average number of base pairs.

The stochastic process used to construct quasi-random saturated structures iteratively
base-pairs the leftmost position in each subinterval. One can imagine a more general
stochastic method of constructing saturated structures, described as follows. Generate an initial list
L of all allowable base pairs (i,j) with $1 \le i < j \le n$ and $j \ge i + \theta + 1$. Create a saturated
structure by repeatedly picking a base pair from L, adding it to an initially empty structure S,
then removing from L all base pairs that form a crossing (pseudoknot) with the base pair just
selected. This ensures that the next time a base pair is picked from L, it can be added to S without
violating the definition of secondary structure. Iterate this procedure until L is empty to form
the stochastic saturated structure S.

Taking an average over 100 repetitions, we have computed the average number of base
pairs, normalized by the sequence length n, and the standard deviation for n = 10, 100, 1000.
Results are $\mu = 0.323$, $\sigma = 0.0604$ for n = 10, $\mu = 0.3526$, $\sigma = 0.0386$ for n = 100 and
$\mu = 0.35618$, $\sigma = 0.0361$ for n = 1000. This
clearly is a different stochastic process than that used for quasi-random saturated structures.
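
A minimal Python sketch of this more general process is given below. It is an illustration under stated assumptions, not the code used for the reported averages: pairs are drawn uniformly from L, and, in addition to crossing pairs, the sketch also removes from L every pair sharing a base with the selected pair (including the selected pair itself), which is needed to avoid base triples. The function names are hypothetical, and the list-based filtering is only practical for moderate n.

```python
import random

def random_saturated(n, theta=1, rng=random):
    """Pick allowable base pairs uniformly at random until none can be added."""
    candidates = [(i, j) for i in range(1, n + 1)
                  for j in range(i + theta + 1, n + 1)]
    structure = []
    while candidates:
        a, b = rng.choice(candidates)
        structure.append((a, b))
        candidates = [
            (i, j) for (i, j) in candidates
            if len({i, j, a, b}) == 4                      # no shared base
            and not (i < a < j < b or a < i < b < j)       # no crossing
        ]
    return sorted(structure)

def is_saturated(pairs, n, theta=1):
    """Brute-force test that no further base pair can be added."""
    paired = {x for p in pairs for x in p}
    free = [x for x in range(1, n + 1) if x not in paired]
    for a in free:
        for b in free:
            if b - a > theta and not any(
                i < a < j < b or a < i < b < j for i, j in pairs
            ):
                return False
    return True

def mean_fraction(n, reps=100, theta=1, rng=random):
    """Average of (number of base pairs) / n over `reps` samples."""
    return sum(len(random_saturated(n, theta, rng)) for _ in range(reps)) / (reps * n)

if __name__ == "__main__":
    random.seed(7)
    for n in (10, 50):
        print(n, mean_fraction(n))
```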

4 Conclusion 

In this paper we applied the DSV methodology and the Flajolet-Odlyzko theorem to
asymptotic enumeration problems concerning canonical and saturated secondary structures. For
instance, we showed that the expected number of base pairs in canonical RNA secondary
structures is equal to $0.31724 \cdot n$, which is far less than the expected number $0.495917 \cdot n$
of base pairs over all secondary structures, the latter of which follows from Theorem 4.19 of
[9]. This may provide a theoretical explanation for the speed-up observed for Vienna RNA
Package when restricted to canonical structures [1].

Additionally, we computed the asymptotic number $1.07427 \cdot n^{-3/2} \cdot 2.35467^n$ of saturated
structures, the expected number $0.337361 \cdot n$ of base pairs of saturated structures and the
asymptotic number $0.323954 \cdot 1.69562^n$ of saturated stem-loop structures. We then considered
a natural stochastic greedy process to generate quasi-random saturated structures, and showed
surprisingly that the expected number of base pairs is $0.340633 \cdot n$, a value very close to
the expected number $0.337361 \cdot n$ of base pairs of all saturated structures. Finally, we applied
a theorem of Drmota [6] to show that the density of states for [all resp. canonical resp.
saturated] secondary structures is asymptotically Gaussian.